Scheduling is the process of allocating radio resources to User Equipment (UE) that
transmits different flows at the same time. It is performed by the scheduling algorithm
implemented in the Long Term Evolution (LTE) base station, the Evolved Node B. Most
proposed algorithms do not handle real-time and non-real-time traffic simultaneously, so
a UE with bad channel quality may starve because it is allocated no resources for a long
time. To solve this problem, the Exponential Blind Equal Throughput (EXP-BET) algorithm
has been proposed. The user with the highest priority metric, calculated using the
EXP-BET metric equation, is allocated resources first. This study investigates the
implementation of the EXP-BET scheduling algorithm on an FPGA platform. The metric
equation of EXP-BET is modelled and simulated using System Generator. The design
utilizes only 10% of the available resources on the FPGA. Fixed numbers are used for all
inputs to the scheduler. The system is verified by hardware co-simulation of the EXP-BET
metric value. The hardware co-simulation output shows that the EXP-BET metric values are
similar to those produced in the Simulink environment. Thus, the algorithm is ready for
prototyping, and the Virtex-6 FPGA is chosen as the platform.

1. INTRODUCTION

A scheduling algorithm is the method used to allocate radio resources to user equipment (UE) [1].
A UE, for example a mobile phone, may transmit different flows such as web browsing and video
streaming at the same time. Scheduling is carried out by the algorithm implemented at the Long Term
Evolution (LTE) base station, the Evolved Node B, and is performed in the Medium Access Control
(MAC) layer. Since the implementation of the scheduling algorithm is an open issue in LTE, many
scheduling algorithms have been proposed by researchers [2], [3]. Various algorithms offering
different techniques for allocating resources to users have been developed, such as the Frame Level
Scheduler (FLS) [4], Modified Largest Weighted Delay First (MLWDF) [5] and Proportional Fairness
(PF) [6]. Many researchers have also suggested packet schedulers that allocate resources to UEs by
considering the channel quality conditions, such as Best Channel Quality Indicator (BCQI) [7] and
Maximum Rate [8]. The scheduling algorithm is one of the important features of LTE, since it
determines which packet has the first priority to be scheduled.

However, none of them considers both real-time flows, such as video streaming and online gaming,
and non-real-time flows, such as web browsing and email. The study in [9] proposed the EXP-BET
algorithm, which considers both real-time and non-real-time flows simultaneously. Based on the
simulation results, EXP-BET performed better than the FLS and EXP-PF algorithms for real-time
services. For non-real-time services, EXP-BET showed a fairness index improvement of 17.72% over
FLS and 7.52% over EXP-PF. The authors conclude that scheduling can be recommended as one method
for solving the problem of cell-edge users, since the EXP-BET algorithm gives users a fair share of
the system resources while considering multiple services.

The Field Programmable Gate Array (FPGA) was introduced by Xilinx. It was developed from the
programmable logic device (PLD) and logic cell array (LCA) concepts. By providing a
two-dimensional array of configurable logic blocks (CLBs) and programming the interconnections
between these configurable resources, an FPGA can implement a wide range of arithmetic and logic
functions [10], [11]. The architecture is a reconfigurable logic device made up of an array of small
logic blocks and interconnection resources. The FPGA has advantages in performance, cost,
reliability, flexibility and time-to-market [12] compared with other popular IC technologies such as
application-specific integrated circuits (ASICs) and digital signal processors (DSPs).

To date, no researchers have implemented the EXP-BET scheduling algorithm on a hardware platform.
In 2015, the authors of [13] focused on the implementation of various algorithms for an arbiter with
low port density (8-bit) on an FPGA platform. A round-robin arbiter, which provides strong fairness,
was selected; it works on the principle that a request that was just served should have the lowest
priority in the next round of arbitration.

Over the past few years, Xilinx has introduced new software tools for FPGA development. One of
them is System Generator, an add-on to Simulink that allows hardware circuits to be designed within
the Simulink environment. The combination of Xilinx System Generator and Simulink provides a simple
technique for hardware design through the use of existing System Generator blocks and subsystems,
which saves both design time and hardware implementation resources. Hence, the proposed algorithm
can be brought towards commercialization sooner, as an FPGA is faster to market: no layout, masks or
other fabrication steps are needed, and it is simpler to design than an ASIC [14]. Hardware
implementation is important for designers of high-performance digital signal processing (DSP)
systems such as wireless networks. Hence, verification on hardware is needed to validate the
theoretical and simulation work.

Therefore, this study aims to implement the EXP-BET algorithm and verify it in hardware
co-simulation using Xilinx System Generator (XSG). The algorithm is modelled in MATLAB Simulink
configured with XSG. The paper is organized as follows: Section 2 describes the research method,
Section 3 presents the results and analysis, and Section 4 draws the conclusion.


2. RESEARCH METHOD 

The proposed packet scheduling algorithm for LTE downlink transmission is the Exponential Rule
and Blind Equal Throughput (EXP-BET) algorithm. The design flow for the EXP-BET algorithm is
presented in Figure 1. The EXP Rule algorithm schedules the real-time services while the BET
algorithm takes care of the non-real-time services; users are served based on the metrics in (1)
and (2).


2.1. Exponential (EXP) rule 

The main idea behind the EXP Rule algorithm is to balance the throughput, fairness and delay
requirements of a scheduling algorithm. The EXP Rule gives higher priority to the user with the
highest transmission delay or the user that has more packets in its buffer. It is a channel-aware
scheduling algorithm that considers the CQI metric in the scheduling decision [15] and has been
shown to be a promising approach for delay-sensitive real-time applications such as video and VoIP.
Its metric is given by (1):


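$$ M_{i,k}^{EXP} = \exp\left( \frac{a_i D_{HOL,i}}{1 + \sqrt{\frac{1}{N_{rt}} \sum_{j=1}^{N_{rt}} D_{HOL,j}}} \right) \Gamma_k^i \tag{1} $$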


where $a_i$ is the tuneable parameter, equal to $5/(0.99\,\tau_i)$; $\tau_i$ is the tolerable time
interval within which the packet must be received; $D_{HOL,i}$ is the delay of the first packet to
be transmitted by the $i$-th user,


2.2. Blind equal throughput

Fairness can be achieved with Blind Equal Throughput (BET), which stores the past average
throughput achieved by each user. The metric for the $i$-th user is calculated as:


$$ M_{i,k}^{BET} = \frac{1}{R_i(t)} \tag{2} $$


where $R_i(t) = \beta R_i(t-1) + (1-\beta)\, r_i(t)$, $\beta$ is the weight factor of the moving
average ($0 < \beta < 1$), $R_i(t-1)$ is the past average throughput of the user at time $t-1$, and
$r_i(t)$ is the achievable data rate for user $i$ at time $t$.
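
The BET computation can be illustrated with a short sketch. It assumes the metric in (2) is the reciprocal of the moving-average throughput; the function name `bet_metric` and its argument names are hypothetical and are not taken from the Simulink model. With the fixed inputs listed in Table 5 (r_i(t) = 5, R_i(t-1) = 10), the reported output 0.1053 equals 1/9.5, which is obtained when the past average carries a weight of 0.9 and the current rate a weight of 0.1. Which of the two weights the table's value of 0.1 denotes is an assumption here, so both cases are shown.

```python
def bet_metric(beta: float, past_avg: float, rate: float) -> float:
    """Return 1 / R_i(t), where R_i(t) = beta * R_i(t-1) + (1 - beta) * r_i(t).

    beta     -- weight of the past average (assumed to lie strictly between 0 and 1)
    past_avg -- R_i(t-1), past average throughput of the user
    rate     -- r_i(t), achievable data rate at time t
    """
    avg = beta * past_avg + (1.0 - beta) * rate  # exponential moving average
    return 1.0 / avg


# Weight 0.9 on the past average: R_i(t) = 9.5
print(round(bet_metric(0.9, 10.0, 5.0), 4))  # 0.1053
# Weight 0.1 on the past average: R_i(t) = 5.5
print(round(bet_metric(0.1, 10.0, 5.0), 4))  # 0.1818
```

A user with a low average throughput receives a large metric, which is how BET steers resources towards users that have been served least.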


[Flowchart: design the EXP-BET algorithm using Simulink blocks; test the system under Simulink;
timing analysis; decision points "Design verified?" and "Timing verified?"]

Figure 1. EXP-BET design flow 


The EXP-BET algorithm is modelled using the Xilinx Blockset. The Xilinx Blockset library
contains the basic blocks, such as adders, multipliers, registers and memories, needed for the
design. Models are created for all the mathematical operations in the EXP-BET metric computation
using the blocks provided by the Xilinx Blockset.

To implement the EXP-BET algorithm on an FPGA, MATLAB Simulink [16] and the Xilinx System
Generator tools need to be configured. In the Simulink environment, the FPGA boundary is defined by
the Gateway In and Gateway Out blocks: inputs to the FPGA are fed into Gateway In and outputs are
produced at Gateway Out. These ports interface between the Simulink double data type and the FPGA
fixed-point environment. In the Gateway In block, the Simulink floating-point input is converted to
a fixed-point format using the saturation and rounding modes defined by the designer. The Gateway
Out block converts the FPGA fixed-point output back to the Simulink double-precision floating-point
format.
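
The effect of the floating-point to fixed-point conversion at the FPGA boundary can be sketched in a few lines. This is a generic illustration, not the Gateway In implementation: the function `to_fixed`, the 16-bit word length and the 8 fractional bits are all assumptions chosen for the example, and the paper does not state the formats used in the design.

```python
import math


def to_fixed(x: float, word_bits: int = 16, frac_bits: int = 8,
             round_nearest: bool = True, saturate: bool = True) -> float:
    """Quantize x to a signed fixed-point grid and return the value it represents."""
    scale = 1 << frac_bits
    scaled = x * scale
    # Rounding mode: round half up, or truncate towards minus infinity.
    q = math.floor(scaled + 0.5) if round_nearest else math.floor(scaled)
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    if saturate:
        q = max(lo, min(hi, q))               # clamp to the representable range
    else:
        q = (q - lo) % (1 << word_bits) + lo  # two's complement wrap-around
    return q / scale


print(to_fixed(0.003))                     # 0.00390625 (one LSB of 1/256)
print(to_fixed(0.003, round_nearest=False))  # 0.0
print(to_fixed(1000.0))                    # 127.99609375 (saturated)
print(to_fixed(128.0, saturate=False))     # -128.0 (wrapped)
```

Small inputs such as delays of a few milliseconds lose most of their precision with only 8 fractional bits, which is why the choice of format matters for the metric values.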

The system is then simulated, tested and verified by examining the results shown on the Display
block from the Simulink library. To validate the model designed in Simulink, timing analysis is
used. Timing analysis is expressed in terms of path delay and verifies the functionality of the
system model generated using XSG and Simulink. The next step is to set up System Generator for
hardware co-simulation, which is one of the techniques System Generator provides for transforming a
model built in the Simulink environment into hardware. XSG can be used with different types of FPGA
boards and provides options for clock speed, compilation type and analysis. The FPGA device used
for the implementation of the EXP-BET algorithm is the Virtex-6 xc6vlx240t-1ff1156.

Lastly, the FPGA is configured using the bitstream programming file (BIT) that System Generator
generates automatically during hardware co-simulation. After the generated bit file is downloaded
onto the FPGA, the input to the device is fed from Simulink source blocks and the device output is
received back in Simulink sink blocks. This enables wide-ranging testing, as data from the FPGA can
be transferred directly to the MATLAB environment. After the hardware co-simulation is completed,
the results can be seen on Display sink blocks from the Simulink library. If the output is similar
to the output of the Simulink environment, the algorithm is confirmed to be successfully prototyped.
The Xilinx blockset used in the design is presented in Figure 2.


[Block icons: System Generator block, CORDIC, Resource Estimator, IO gateway, AddSub block,
Constant block, Square root block, CMult block, Divide block, Convert block, Simulink Display
block, Simulink Constant block]


Figure 2. Xilinx blockset used in the Simulink design [17]


3. RESULTS AND ANALYSIS 
This section discusses the results of simulating the EXP-BET metric equation in System
Generator. The results obtained are then verified using hardware co-simulation.


3.1. Simulating the EXP-BET algorithm using system generator 

First, the design of EXP-BET is verified through rate and type propagation using the System
Generator block. If a signal carrying floating-point data is connected to the port of a System
Generator block that does not support the floating-point data type, an error is reported. The rate
and type propagation for the BET and EXP Rule algorithms is illustrated in Figure 3 and Figure 4,
respectively.


Figure 3. Rate and type propagation for BET algorithm


Figure 4. Rate and type propagation for EXP rule algorithm


3.2. Timing analysis 

Timing is very important when working with a hardware description language, because hardware
processes execute simultaneously, that is, in parallel. System Generator provides a timing analysis
tool, the timing analyzer, to assist the timing analysis of the hardware design. The timing
analyzer reports the slow paths and clearly displays the paths that fail in hardware. The System
Generator block offers three clock frequency options for the Xilinx ML605 board: 100 MHz, 50 MHz or
33.3 MHz [15]. To start, a clock frequency of 50 MHz is selected, which means that the system should
operate within an FPGA clock period of 20 ns. The clock period is calculated as:


$$ T = \frac{1}{f} \tag{3} $$


where $f$ is the clock frequency.

The EXP system fails to generate the hardware co-simulation: the total path delay is 112.64 ns,
which is far higher than the 20 ns clock period, as shown in Figure 5. The timing analyzer in
Figure 6 details the failing paths of the EXP system and automatically highlights the corresponding
blocks of the EXP system, as shown in Figure 7, when the cursor is pointed at one of the entries
listed in Figure 6. A failing path indicates a timing violation: the output of one synchronous
stage does not reach the input of the next stage within the time required by the system design. As
observed in Figure 7, timing failed for the paths through the Divide, Square root and CORDIC 4.0
blocks. Hence, the failing paths need to be optimized.
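
The timing check described above amounts to comparing a path delay with the clock period from (3). The sketch below works through the numbers quoted in this section; the helper names `clock_period_ns` and `slack_ns` are hypothetical and do not correspond to any System Generator function.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period T = 1/f, in nanoseconds, for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(path_delay_ns: float, freq_mhz: float) -> float:
    """Clock period minus path delay; a negative value is a timing violation."""
    return clock_period_ns(freq_mhz) - path_delay_ns


# The three clock options quoted for the ML605 board.
for freq in (100.0, 50.0, 33.333):
    print(f"{freq} MHz -> {clock_period_ns(freq):.2f} ns")

# The failing EXP path at 50 MHz.
print(f"{slack_ns(112.64, 50.0):.2f} ns")  # -92.64 ns
```

The same arithmetic shows that lowering the clock to 33.333 MHz would not by itself rescue an unpipelined 112.64 ns path, since the period would still be only about 30 ns.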


[Histogram: distribution of total path delay; x-axis: delay (ns); largest delay 112.64 ns]


Figure 5. Histogram for EXP system failing path (50 MHz)


Figure 7. EXP rule algorithm failing path (50 MHz)


The slow path of each block is optimized by pipelining, since the hardware operates in parallel.
With pipelining, the calculation is split over multiple clock cycles. For example, the addition
operation needs to wait for the division operation, which takes many iterations to produce its
output; latency is therefore added to the addition path so that it waits for the division.
Pipelining is implemented by adding register or delay stages to the design, after which synthesis
is run again and the hardware co-simulation is regenerated to check that the timing requirement is
met.

In this research, latency is added to the individual blocks throughout the design, as tabulated
in Table 1. Latency is the number of clock cycles a block needs to propagate an input to its
output. For example, if a new input requires 10 cycles to propagate from input to output, the
latency is 10. In addition, to address the problems shown in Figure 5 to Figure 7, the clock
frequency is set to the lowest available option, 33.333 MHz, because timing constraints are easier
to meet at a slower clock rate. Table 2 shows the frequency and FPGA clock period for the EXP-BET
system before and after the optimization process.
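
When pipelined blocks with different latencies feed a common operator, the faster branches need extra delay stages so that all operands arrive in the same cycle. The sketch below computes those extra stages from the after-optimization latencies in Table 1. It assumes, purely for illustration, that the three blocks drive the same downstream operator; the actual datapath topology is not given in the text, and `balancing_delays` is a hypothetical helper.

```python
def balancing_delays(latencies: dict[str, int]) -> dict[str, int]:
    """Extra register stages each branch needs to line up with the slowest one."""
    slowest = max(latencies.values())
    return {name: slowest - cycles for name, cycles in latencies.items()}


# Latencies after optimization, in clock cycles (Table 1).
after = {"Divide": 19, "Square Root": 17, "Multiply": 3}
print(balancing_delays(after))
# {'Divide': 0, 'Square Root': 2, 'Multiply': 16}
```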


Table 1. Latency Before and After Optimization

Blockset       Latency
               Before   After
Divide         0        19
Square Root    0        17
Multiply       3        3


Table 2. Frequency and FPGA Clock Period Before and After Optimization

Parameters           Before   After
Clock Period (ns)    20       30
Frequency (MHz)      50       33.333

The optimized EXP-BET system is simulated once again and meets all the timing constraints. The
EXP-BET system is considered successfully verified in hardware co-simulation when the bitstream is
generated successfully after the compilation stage; the hardware co-simulation is considered to
have failed when a timing constraint is violated.

Figure 8 and Figure 9 show the histograms of the EXP-BET path delay after the system is
optimized. The histograms of the delay distribution of 150 paths are generated by the Xilinx timing
analyzer targeting the Virtex-6 FPGA board. Each histogram is a useful way to analyze the FPGA
implementation of EXP-BET: it groups the 150 paths into clusters, each roughly normally
distributed, that arise from different portions of the System Generator architecture or from
different clock region timing constraints. The numbers at the top of the bins indicate the number
of slow paths. The improved, parameterized FPGA implementation can be adjusted so that all signals
are completely routed and all timing constraints are met.


[Histogram: distribution of total path delay]


Figure 8. Histogram for BET path delay (33.333 MHz)


[Histogram: distribution of total path delay; one bin is labelled 108]


Figure 9. Histogram for EXP rule path delay (33.333 MHz)


The histograms in Figure 8 and Figure 9 show that the BET and EXP Rule path delays fall within
the 30 ns clock period (33.333 MHz) and meet the timing constraints. The majority of the slow paths
for BET occur at 25.06 ns, whereas the slowest path for EXP is observed at 29.65 ns. Therefore, it
can be concluded that the EXP-BET system is able to run on the FPGA board within a 30 ns clock
period.


3.3. Power analysis

Power consumption is one of the biggest concerns of FPGA users, and power estimation and
analysis become even more important as FPGAs increase in logic capacity and performance by
migrating to smaller process geometries [18]. The Xilinx power tools help to perform power
estimation and analysis for a given design. The Xilinx Power Analyzer (XPA) is used to analyze the
power consumption of the design, which depends on the device family and on the clock, logic,
signal, I/O and leakage power. Table 3 shows the estimated power consumption of the EXP-BET system.
The designed architecture uses a total power of 3.472 W for the EXP Rule system and 3.437 W for the
BET system. Most of this is leakage power (3.423 W and 3.422 W, respectively), so the power
attributable to the design itself is small. It has been reported that current FPGA technology such
as the Virtex-6 gives low power consumption while operating at maximum performance [19].


Table 3. EXP-BET Power Analysis

Parameter   Power (W)        Used           Available          Utilization (%)
            EXP     BET      EXP    BET     EXP      BET       EXP   BET
Clocks      0.010   0.003    1      1       -        -         -     -
Logic       0.020   0.004    6288   1030    150720   150720    4     1
Signals     0.011   0.002    8647   1529    -        -         -     -
DSPs        0.000   0.000    6      2       600      768       1     0
IOs         0.008   0.006    129    97      600      600       22    16
Leakage     3.423   3.422
Total       3.472   3.437
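
The totals in Table 3 can be cross-checked by summing the individual contributions, which also shows how small the non-leakage share is. The sketch below is a plain arithmetic check on the tabulated values; the dictionary layout and the `summarize` helper are illustrative choices, not output from the Xilinx Power Analyzer.

```python
# Power contributions in watts, copied from Table 3.
power_w = {
    "EXP": {"clocks": 0.010, "logic": 0.020, "signals": 0.011,
            "dsp": 0.000, "io": 0.008, "leakage": 3.423},
    "BET": {"clocks": 0.003, "logic": 0.004, "signals": 0.002,
            "dsp": 0.000, "io": 0.006, "leakage": 3.422},
}


def summarize(parts: dict[str, float]) -> tuple[float, float, float]:
    """Return (total W, non-leakage W, non-leakage share in percent)."""
    total = sum(parts.values())
    non_leakage = total - parts["leakage"]
    return round(total, 3), round(non_leakage, 3), round(100.0 * non_leakage / total, 2)


for name, parts in power_w.items():
    print(name, summarize(parts))
# EXP (3.472, 0.049, 1.41)
# BET (3.437, 0.015, 0.44)
```

Both sums reproduce the totals in the table, and in each case the clock, logic, signal, DSP and I/O power together account for less than 2% of the estimate.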


3.4. Design summary for device utilization

The EXP-BET system was implemented on an XC6VLX240T FPGA. The flexibility of the Virtex-6 FPGA
is realized in its slice resources. Each slice is composed of four 6-input look-up tables (LUTs)
and associated logic. The slices are laid out in an array-like structure and can be configured to
form larger, more complex systems. FPGA logic design is controlled at the bit level, giving the
user the power to decide which resources to use, where the design is placed in hardware and the
maximum sustainable clock frequency. Table 4 shows the device utilization summary for the EXP-BET
system. It includes the maximum operating frequency, expressed as the clock period, along with the
resource utilization before and after optimization of the critical path.


Table 4. Design Utilization Summary

System                               EXP Rule                 BET
Optimization                         Before     After         Before    After
FFs                                  105        2122 (1%)     105       337 (1%)
LUTs                                 5,931      6288 (4%)     1644      1030 (1%)
Slices                               1,765      1,967 (5%)    610       309 (1%)
LUT-FF pairs                         97         1631 (24%)    154       255 (23%)
Number of DSP48E1s                   6          6 (1%)        3         3 (1%)
Maximum Operating Frequency (MHz)
Clock Period (ns)                    112.996    29.970        59.243    25.354
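
Table 4 reports the critical path as a clock period, so the corresponding maximum operating frequency follows from the reciprocal relation in (3). The sketch below derives it for each column and checks it against the clock selected for that run (50 MHz before optimization, 33.333 MHz after, as in Table 2). The derived frequencies are computed here for illustration and are not values quoted by the authors; `fmax_mhz` is a hypothetical helper.

```python
def fmax_mhz(period_ns: float) -> float:
    """Maximum operating frequency in MHz implied by a critical-path period in ns."""
    return 1000.0 / period_ns


# (system, stage) -> (clock period from Table 4 in ns, selected clock in MHz)
runs = {
    ("EXP Rule", "before"): (112.996, 50.0),
    ("EXP Rule", "after"): (29.970, 33.333),
    ("BET", "before"): (59.243, 50.0),
    ("BET", "after"): (25.354, 33.333),
}

for (system, stage), (period_ns, clock_mhz) in runs.items():
    fmax = fmax_mhz(period_ns)
    verdict = "meets" if fmax >= clock_mhz else "misses"
    print(f"{system} {stage}: fmax = {fmax:.2f} MHz, {verdict} {clock_mhz} MHz")
```

Under these assumptions both systems miss the 50 MHz target before optimization and meet the 33.333 MHz target afterwards, which matches the outcome described in Section 3.2.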


The FPGA fabric is the fundamental structure of the logic device and consists of flip-flops
(FFs), look-up tables (LUTs) and slices. The hard IP cores used are DSP48E1 slices [20]. Each
Virtex-6 FPGA slice contains four LUTs and eight FFs, and only some slices can use their LUTs as
distributed RAM. Each slice has one set of clock, clock enable and set/reset signals that is common
to its logic cells. According to the simulation reports (refer to the Appendix), the BET system
requires just 3% of the logic resources in the FPGA: LUTs (1%), FFs (1%) and slices (1%). The EXP
Rule system requires 10% of the logic resources: LUTs (4%), FFs (1%) and slices (5%). A LUT-FF pair
in this architecture represents one LUT paired with one flip-flop within a slice. The clock rate of
the Virtex-6 FPGA family is 600 MHz, which is high enough to drive the whole system.

According to the simulation results, the BET system took 0.209 ns to generate its output and
the EXP system took 0.246 ns to calculate its output. Because this delay is small and the design is
pipelined, the EXP-BET system can generate output continuously. Moreover, pipelining keeps the
delay of the clock net very small, about 0.2 ns, which improves the system performance. Using the
Xilinx Power Analyzer as the power estimation tool, the total power is estimated from the device
utilization, clock rate and device data model.


3.5. Hardware co-simulation

The final verification was completed by hardware co-simulation, which allows the system
simulation to run completely on the FPGA while the results are shown in Simulink. When the
point-to-point Ethernet interface is selected, a new hardware co-simulation block is generated
automatically; this is the process of generating the equivalent hardware for EXP-BET. The Virtex-6
(xc6vlx240t-1ff1156) is used and, with the help of XSG and Xilinx XFLOW, the programmable bit file
for the equivalent hardware is generated, as shown in Figure 10 and Figure 11. Table 5 shows the
metric values of the EXP-BET algorithm produced by hardware co-simulation using fixed input values.


Table 5. Output Produced by the Co-simulation Method

Algorithm   Parameter        Input    Output
EXP         τ_i              0.01     10.16
            D_HOL            0.003
            Average D_HOL    0.03
            N_rt             10
            Γ_k^i            3
BET         β                0.1      0.1053
            r_i(t)           5
            R_i(t-1)         10


[Simulink co-simulation models: Constant source blocks drive the hardware co-simulation block
through the point-to-point Ethernet interface, and the Gateway Out port feeds a Display block; the
EXP rule model displays 10.16]

Figure 11. EXP rule hardware co-simulation model 
The port names on the hardware co-simulation block, Gateway In1 to Gateway In5, match the port
names on the original subsystem, and the port types and rates also match the original design. When
a value is written to one of the block's Gateway In ports, the block sends the corresponding data
to the appropriate location in hardware. The controller output (Gateway Out) from the hardware is
read back into the Simulink model over the Ethernet interface; the output port converts the
fixed-point data into the Simulink format and feeds it into the model.

The EXP-BET system has been run in hardware co-simulation and successfully implemented on the
FPGA. The output values of the EXP Rule and BET systems are 10.16 and 0.1053, respectively, and
represent the metric values of the LTE scheduling algorithm. The EXP-BET system is verified, since
the metric values calculated in the Simulink environment are similar to those from hardware
co-simulation. The device chosen for prototyping is the Virtex-6 FPGA and the hardware description
language is Verilog. A project is then generated for the Xilinx Integrated Software Environment
(ISE), which includes the files for the structural description of the system.


4. CONCLUSION 

The implementation of the EXP-BET scheduling algorithm on an FPGA was presented in this paper.
EXP-BET consists of the Exponential Rule (EXP Rule) and Blind Equal Throughput (BET) algorithms.
The work was designed and simulated using Xilinx System Generator, the Xilinx ISE Design Suite and
MATLAB Simulink, resulting in a mathematical model of the EXP-BET metric equations built from
System Generator blocks. The timing requirement for the path delay is 30 ns, which means that the
system is expected to run at a clock period of 30 ns (33.333 MHz); otherwise, the system will not
meet the constraint and cannot run on the FPGA. The final verification of the design was conducted
using hardware co-simulation, which generates the equivalent hardware, in the form of a bitstream,
for the EXP-BET algorithm. System Generator then generated the bit file, which was downloaded to
the Virtex-6 FPGA.

This study provides the design and implementation process of an FPGA-based system using System
Generator for a scheduling algorithm, namely EXP-BET. It can be used as a basis for future work
towards application in LTE/LTE-A. In addition, a practical system could be implemented once the
whole transmitting and receiving chain of the physical layer is established. The limitation of this
research is that no input signal can be injected into the EXP-BET system on the FPGA, since the
scheduling algorithm is located at the LTE MAC layer and its input is transmitted from the physical
layer. Hence, the implementation must start from the physical layer to generate the input for the
scheduler. Further study should therefore concentrate on the hardware implementation of the whole
system, starting from the physical layer protocol, so that the results of the implemented EXP-BET
algorithm can be analysed and validated in terms of QoS requirements such as throughput, delay and
packet loss rate.